A Language-Sensitive Text Editor for Dutch

Authors

Gerard Kempen

Theo Vosse

Abstract

Modern word processors are beginning to offer a range of facilities for spelling, grammar and style checking in English. For Dutch, hardly anything is available as yet. Many commercial word processing packages do include a hyphenation routine and a lexicon-based spelling checker, but the practical usefulness of these tools is limited by certain properties of Dutch orthography, as we will explain below. In this chapter we describe a text editor which incorporates a great deal of lexical, morphological and syntactic knowledge of Dutch and monitors the orthographical quality of Dutch texts. Section 1 deals with those aspects of Dutch orthography which pose problems to human authors as well as to computational language-sensitive text editing tools. In section 2 we describe the design and the implementation of the text editor we have built. Section 3 is mainly devoted to a provisional evaluation of the system.

1. Some Dutch spelling problems

The three hardest problems of Dutch orthography are easily explained in terms of a comparison with English, French and German:

1. As in English, there are quite a few homophonous spelling patterns whose occurrence in specific words is difficult to remember. For instance, the sound /eI/ is spelled as ei in some words but as ij in others. In addition, proper names often spell the same sound as y, ey or even eij. There are four common ways of rendering the diphthong which sounds like the one in Eng. how: ou, au, ouw and auw. The consonant /k/ is spelled as c, cc, k or kk, and as ck in many proper names. Many authors also forget when to double a vowel or a consonant (e.g. the double d and l in onmiddellijk, Eng. immediately).

2. As in French, various frequent inflectional suffixes are homophonous. For example, the verb gebeuren (Eng. to happen) has a third-person singular present-tense form gebeurt which sounds exactly the same as the past participle gebeurd. Due to word-final devoicing, the endings -t and -d are both pronounced /t/. If the verb stem ends in -d (also pronounced /t/), the third-person -t suffix is obligatorily added although it cannot be heard (as in the passive auxiliary wordt, Eng. is). Thus, the third-person singular is orthographically distinguished from the homophonous first-person
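The limitation of a purely lexicon-based checker on d/t homophones can be illustrated with a minimal Python sketch. It is not the editor described in this chapter: the toy lexicon and the function names `in_lexicon` and `dt_homophones` are assumptions made for illustration only. The sketch shows that a word-list lookup accepts both gebeurt and gebeurd, so it cannot tell which form a sentence requires, whereas it does reject a non-word such as onmiddelijk.

```python
from __future__ import annotations

# Toy lexicon of valid Dutch word forms (hypothetical, for illustration only).
LEXICON = {"gebeurt", "gebeurd", "wordt", "word", "onmiddellijk"}


def in_lexicon(word: str) -> bool:
    """Lexicon-based check: a word counts as correct if it is a known form."""
    return word.lower() in LEXICON


def dt_homophones(word: str) -> set[str]:
    """Return other lexicon forms that sound the same under final devoicing.

    Word-final -d, -t and -dt are all pronounced /t/, so swapping these
    endings yields candidate forms that are indistinguishable by ear.
    """
    w = word.lower()
    for ending in ("dt", "d", "t"):  # test the longest ending first
        if w.endswith(ending):
            stem = w[: -len(ending)]
            break
    else:
        return set()  # word does not end in -d, -t or -dt
    candidates = {stem + e for e in ("d", "t", "dt")}
    return {c for c in candidates if c in LEXICON and c != w}


if __name__ == "__main__":
    for form in ("gebeurt", "wordt", "onmiddelijk"):
        print(form, in_lexicon(form), sorted(dt_homophones(form)))
```

Because both members of each homophone pair pass the lookup, choosing between them requires morphological and syntactic information about the sentence, which a word list alone does not provide.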




Publication date: 2011
